Abstract



This chapter reviews measures of emergence, self-organization, complexity, homeostasis,
and autopoiesis based on information theory. These measures are derived from proposed
axioms and tested in two case studies: random Boolean networks and an Arctic lake
ecosystem.

Emergence is defined as the information produced by a system or process. Self- 
organization is defined as the opposite of emergence, while complexity is defined as the 
balance between emergence and self-organization. Homeostasis reflects the stability of 
a system. Autopoiesis is defined as the ratio between the complexity of a system and 
the complexity of its environment. The proposed measures can be applied at different 
scales, which can be studied with multi-scale profiles. 

1 Introduction 

In recent decades, the scientific study of complex systems (Bar-Yam, 1997; Mitchell, 2009)
has demanded a paradigm shift in our worldviews (Gershenson et al., 2007; Heylighen et al.,
2007). Traditionally, science has been reductionistic. Still, complexity occurs when
components are difficult to separate, due to relevant interactions. These interactions are
relevant because they generate novel information which determines the future of systems.
This fact has several implications (Gershenson, 2013). A key implication is that
reductionism, the most popular approach in science, is not appropriate for studying complex
systems, as it attempts to simplify and separate in order to predict. Novel information
generated by interactions limits prediction, as it is not included in initial or boundary
conditions. It implies computational irreducibility (Wolfram, 2002), i.e. one has to reach a
certain state before knowing it will be reached. In other words, a priori assumptions are of
limited use, since the precise future of complex systems is known only a posteriori. This
does not imply that the future is random; it just implies that the degree to which the
future can be predicted is inherently limited.

It can be said that this novel information is emergent, since it is not in the components, 
but produced by their interactions. Interactions can also be used by components to self- 
organize, i.e. produce a global pattern from local dynamics. Interactions are also key for 
feedback control loops, which help systems regulate their internal states, an essential aspect 
of living systems. 

We can see that reductionism is limited for describing such concepts as complexity,
emergence, self-organization, and life. In the wake of the fall of reductionism as a
dominant worldview (Morin, 2007), a plethora of definitions, notions, and measures of these
concepts has been proposed. Still, their diversity seems to have created more confusion than
knowledge. In this chapter, we review a proposal to ground measures of these concepts in
information theory. This approach has several advantages:

• Measures are precise and formal. 

• Measures are simple enough to be used and understood by people without a strong 
mathematical background. 

• Measures can help clarify the meaning of the concepts they describe. 

• Measures can be applied to any phenomenon, as anything can be described in terms
of information (Gershenson, 2012b).

This chapter is organized as follows: In the next section, background concepts are pre- 
sented, covering briefly complexity, emergence, self-organization, homeostasis, autopoiesis, 
information theory, random Boolean networks, and limnology. Section 3 presents axioms and 
derives measures for emergence, self-organization, complexity, homeostasis and autopoiesis. 
To illustrate the measures, these are applied to two case studies in Section 4: random Boolean 
networks and an Arctic lake ecosystem. Discussion and conclusions close the chapter. 

2 Background 

2.1 Complexity 

There are dozens of notions and measures of complexity, proposed in different areas with
different purposes (Edmonds, 1999; Lloyd, 2001). Etymologically, complexity comes from
the Latin plexus, which means interwoven. Thus, something complex is difficult to separate.
This means that its components are interdependent, i.e. their future is partly determined by
their interactions. Thus, studying the components in isolation, as reductionistic approaches
attempt, is not sufficient to describe the dynamics of complex systems.

Nevertheless, it would be useful to have global measures of complexity, just as temperature
characterizes the properties of kinetic energy of molecules or photons. Each component
can have a different kinetic energy, but the statistical average is represented in the
temperature. For complex systems, particular interactions between components can be
different, but we can say that measures of complexity should represent the type of
interactions between components.

A useful measure of complexity should enable us to answer questions such as: Is a desert
more or less complex than a tundra? What is the complexity of different influenza outbreaks?
Which organisms are more complex: predators or prey; parasites or hosts; individual or
social? What is the complexity of different music genres? What is the required complexity
of a company to face the complexity of a market?^1

Moreover, with the current scandalous increase of data availability, we urgently need 
measures to make sense of it. 

2.2 Emergence 

Emergence has probably been one of the most misused concepts in recent decades. The 
reasons for this misuse are varied and include: polysemy (multiple meanings), buzzwording, 
confusion, hand waving, Platonism, and even mysticism. Still, the concept of emergence 
can be clearly defined and understood (Anderson, 1972). The properties of a system are 
emergent if they are not present in their components. In other words, global properties 
which are produced by local interactions are emergent. For example, the temperature of a 
gas can be said to be emergent (Shalizi, 2001), since the molecules do not possess such a 
property: it is a property of the collective. 

Some might perceive difficulties in describing phenomena at different scales (Gershen- 
son, 2013), but this is a consequence of attempting to find a single "true" description of 
phenomena. Phenomena do not depend on the descriptions we have of them, and we can 
have several different descriptions of the same phenomenon. It is more informative to handle 
several descriptions at once, and actually it is a necessity when studying emergence and 
complex systems. 

2.3 Self-organization 

Self-organization has been used to describe swarms, flocks, traffic, and many other systems
where the local interactions lead to a global pattern or behavior (Camazine et al., 2003;
Gershenson, 2007). Intuitively, self-organization implies that a system increases its own
organization. This leads to the problems of defining organization, system, and self. Moreover,
as Ashby showed (1947b), almost any dynamical system can be seen as self-organizing: if it
has an attractor, and we decide to call that attractor "organized", then the system dynamics
will tend to it, thus increasing by itself its own organization. If we can describe almost any
system as self-organizing, the question is not whether a system is self-organizing or not, but
rather, when is it useful to describe a system as self-organizing (Gershenson and Heylighen,
2003)?

^1 The question of the complexity a company requires to face the complexity of a market is
related to the law of requisite variety (Ashby, 1956).

In any case, it is convenient to have a measure of self-organization which can capture at
the global scale the local dynamics. This is especially relevant for the nascent field of guided
self-organization (GSO) (Prokopenko, 2009; Ay et al., 2012). GSO can be described as the
steering of the self-organizing dynamics of a system towards a desired configuration
(Gershenson, 2012a). This desired configuration will not always be the natural attractor of a
controlled system. The mechanisms for guiding the dynamics and the design of such
mechanisms will benefit from measures characterizing the dynamics of systems in a precise
and concise way.

2.4 Homeostasis 

Originally, the concept of homeostasis was developed to describe internal and physiological 
regulation of bodily functions, such as temperature or glucose levels. Probably the first 
person to recognize the internal maintenance of a near-constant environment as a condition 
for life was Bernard (1859). Subsequently, Cannon (1932) coined the term homeostasis from
the Greek homoios (similar) and stasis (standing still). Cannon defined homeostasis as the 
ability of an organism to maintain steady states of operation, in view of the internal and 
external changes. Homeostasis does not imply an immobile or a stagnant state. Although 
some conditions may vary, the main properties of an organism are maintained. 

Later, the British cybernetician William R. Ashby proposed, in an alternative form,
that homeostasis implies an adaptive reaction to maintain "essential variables" within a
range (Ashby, 1947a, 1960). In order to explain the generation of behavior and learning in
machines and living systems, Ashby also contributed by linking the concepts of ultrastability 
and homeostatic adaptation (Di Paolo, 2000). Ultrastability refers to the normal operation 
of the system within a "viability zone" to deal with environmental changes. This viability 
zone is defined by the lower and upper bounds of the essential variables. If the value of 
variables crosses the limits of its viability zone, the system has a chance of finding new 
parameters that make the challenged variables return to their viability zone. 

A dynamical system has a high homeostatic capacity if it is able to maintain its dynamics 
close to a certain state or states (attractors). As explained above, when perturbations or 
environmental changes occur, the system adapts to face the changes within the viability 
zone, that is, without the system "breaking" (Ashby, 1947a). Homeostasis can be seen as 
a dynamic process of self-regulation and adaptation by which systems adapt their behavior 
over time (Williams, 2006). The homeostasis concept can be applied to different fields beyond 
life sciences and is also closely related to self-organization and to robustness (Wagner, 2005; 
Jen, 2005). 

2.5 Autopoiesis 

Autopoiesis comes from the Greek auto (self) and poiesis (creation, production) and was
proposed as a concept to define the living. According to Maturana (2011), the notion of
autopoiesis was created to connote and describe the molecular processes taking place in
the realization of living beings as autonomous entities in this world. However, the
word autopoiesis, as the name of the organization of living systems as discrete autonomous
entities that exist as closed networks of molecular production, was chosen only in
1970 (Maturana and Varela, 1980). This notion arises from a series of questions, related to
the internal dynamics of living systems, which Maturana started considering in the 1960s,
such as: "What should be the constitution of a system so that I see a living system as a
result of its operation?", "What kind of systems or entities are living systems?", and another
question that a student asked Maturana: "What happened three billion eight hundred million
years ago so that you can now say that living systems began then?"

In the context of autopoiesis, living beings occur as discrete autonomous dynamic molecular
autopoietic entities. These entities are in a continuous realization of their self-production as 
molecular autopoietic systems. Thus, autopoiesis describes the internal dynamics of a living 
system in the molecular domain. Maturana notices that living beings are dynamical systems 
in continuous change. Interactions between elements of an autopoietic system regulate the 
production and regeneration of the system's components, having the potential to develop, 
preserve, and produce their own organization (Varela et al., 1974). 

The concept of autopoiesis has been extended to other areas beyond biology (Luisi, 2003;
Seidl, 2004; Froese and Stewart, 2010).

2.6 Information Theory 

Information has had a most interesting history (Gleick, 2011). Information theory was
created by Claude Shannon in 1948 in the context of telecommunications. He analyzed
whether it was possible or not to reconstruct data transmitted across a noisy channel. In his
model, information is represented as a string X = x_0 x_1 ..., where each x_i is a symbol from
a finite set of symbols A called the alphabet. Moreover, each symbol in the alphabet has
a given probability P(x) of occurring in the string. Common symbols will have a high P(x),
while infrequent symbols will have a low P(x).

Shannon was interested in a function to measure how much information was "produced"
by a process. Quoting Shannon (1948)^2:

Suppose we have a set of possible events whose probabilities of occurrence are
p_1, p_2, ..., p_n. These probabilities are known but that is all we know about the
event that might occur. Can we find a measure of how much "choice" is involved
in the selection of the event or how uncertain we are of the outcome? If there is
such a measure, say I(p_1, p_2, ..., p_n), it is reasonable to require of it the following
properties:

1. I should be continuous in each p_i.

2. If all the p_i are equal, p_i = 1/n, then I should be a monotonic increasing
function of n. With equally likely events there is more choice, or
uncertainty, when there are more possible events.

3. If a choice be broken down into two successive choices, the original I should
be the weighted sum of the individual values of I.

^2 We replaced Shannon's H with I.

With these few axioms, Shannon demonstrates that the only function I satisfying the three
properties above is of the form:

$$I = -K \sum_{i=1}^{n} p_i \log p_i, \tag{1}$$

where K is a positive constant.

For example, if we have a string '0001000100010001...', we can estimate P(0) = 0.75 and
P(1) = 0.25, then I = -(0.75 · log 0.75 + 0.25 · log 0.25). If we use K = 1 and a base 2
logarithm, then I ≈ 0.811.
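The estimate above can be reproduced with a short Python sketch. It is only an illustration, not code from the chapter: the function name `shannon_information` and its parameters are our own, and the probabilities are assumed to be estimated as relative symbol frequencies in a finite string.

```python
import math
from collections import Counter


def shannon_information(string, K=1.0, base=2):
    """Estimate I = -K * sum(p_i * log p_i) from symbol frequencies.

    Hypothetical helper: probabilities are approximated by the relative
    frequency of each symbol in the finite string.
    """
    counts = Counter(string)
    total = len(string)
    return -K * sum(
        (c / total) * math.log(c / total, base) for c in counts.values()
    )


# Finite prefix of the string '0001000100010001...' used in the text.
info = shannon_information("0001" * 4)
print(round(info, 3))  # 0.811
```

Symbols that never occur do not appear in the counts, which matches the usual convention that 0 · log 0 contributes nothing to the sum.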

Shannon used H to describe information (we are using I) because he was thinking of
Boltzmann's H theorem^3 when he developed the theory. Therefore, he called equation 1 the
entropy of the set of probabilities p_1, p_2, ..., p_n. In modern terms, I is a function of a
random variable X.

The unit of information is the bit (binary digit). One bit represents the information
gained when a binary random variable becomes known. However, since equation 1 is a sum
of probabilities, Shannon's information is a unitless measure.

More details about information theory in general can be found in Ash (1990), while 
a primer on information theory related to complexity, self-organization, and emergence is 
found in Prokopenko et al. (2009). 

2.7 Random Boolean Networks 

Random Boolean networks (RBNs) are abstract computational models, originally proposed 
to study genetic regulatory networks (Kauffman, 1969, 1993). However, being general mod- 
els, their study and use has gone beyond biology (Aldana-Gonzalez et al., 2003; Gershenson, 
2004, 2012a). 

An RBN is formed by N nodes linked by K connections^4. Each node has a Boolean state,
i.e. zero or one. The future state of each node is determined by the current states of the 
nodes that link to it and a lookup table which specifies how the update will take place. The 
connectivity (which nodes affect which) and the lookup tables (how nodes affect their states) 
are usually generated randomly for a network, but remain fixed during its dynamics. 
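The description above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the setup used in the case study: we assume that every node receives exactly K inputs chosen at random (with replacement), that all nodes are updated synchronously, and that lookup tables are filled with unbiased random bits. All function names are hypothetical.

```python
import random


def make_rbn(n, k, rng):
    """Draw random wiring and lookup tables; both stay fixed afterwards."""
    # inputs[i] lists the k nodes that affect node i (assumption: sampled
    # with replacement, so repeated or self inputs are possible).
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    # tables[i][j] is the next state of node i for input combination j.
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronous update: every node reads the current state of its inputs."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for src in node_inputs:
            index = (index << 1) | state[src]  # pack input bits into an index
        new_state.append(table[index])
    return new_state


def run_rbn(n=10, k=2, steps=20, seed=0):
    """Return the trajectory (list of states) of one random network."""
    rng = random.Random(seed)
    inputs, tables = make_rbn(n, k, rng)
    state = [rng.randint(0, 1) for _ in range(n)]
    trajectory = [state]
    for _ in range(steps):
        state = step(state, inputs, tables)
        trajectory.append(state)
    return trajectory


for row in run_rbn(n=10, k=2, steps=5, seed=1):
    print("".join(map(str, row)))
```

Each printed row is one state of the network; reading down a column gives the time series of a single node.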

RBNs have been found to have three different dynamical regimes, which have been studied 
extensively (Gershenson, 2004): 

Ordered. Most nodes are static, RBNs are robust to perturbations. 

Chaotic. Most nodes are changing, RBNs are fragile to perturbations. 

Critical. Some nodes are changing, RBNs have adaptive potential. 



^3 Boltzmann's H theorem is stated in the thermodynamic context. It states that the entropy of an ideal
gas increases in an irreversible process. This might also be the reason why he required the second property.
^4 This K is different from the constant used in equation 1.



Different parameters and properties determine the regime, which can be used to guide a 
particular RBN towards a desired regime (Gershenson, 2012a). 

It can be said that the critical regime balances the robustness of the ordered regime and
the changeability of the chaotic regime. It has been argued that computation and life require
this balance to be able to compute and adapt (Langton, 1990; Kauffman, 1993).

RBNs will be used in Section 4.1 to illustrate the measures proposed in the next section. 

2.8 Limnology 

Lakes are studied by limnology. Lakes can be divided into different zones, as shown in
Figure 1: (i) The macrophyte zone is composed mainly of aquatic plants, which are rooted,
floating, or submerged. (ii) The planktonic zone corresponds to the open surface waters away
from the shore, in which organisms without self-movement live (phyto- and zooplankton).
(iii) The benthic zone is the lowest level of a body of water, related to the substratum,
including the sediment surface and subsurface layers. (iv) The mixing zone is where the
exchange of water between the planktonic and benthic zones occurs.




Figure 1: Zones of lakes studied in limnology. 



At different zones, one or more components or subsystems can be used to assess
the ecosystem dynamics. For our case study, presented in Section 4.2, we considered
three components: physiochemical, limiting nutrients, and photosynthetic biomass for the
planktonic and benthic zones.

The physiochemical component refers to the chemical composition of water. It is affected
by various conditions and processes such as geological nature, the water cycle, dispersion,
dilution, solute and solids generation (e.g. photosynthesis), and sedimentation. In this
component, we highlight two water variables that are important for aquatic life: (i) the
pH equilibrium that affects, among others, the interchange of elements between the organism
and its environment and (ii) the temperature regulation that is supported by the specific heat
of the water.

Related to the physiochemical component, limiting nutrients which are basic for
photosynthesis are associated with the biogeochemical cycles of nitrogen, carbon, and
phosphorus. These cycles permit the adsorption of gases into the water or the dilution of
some limiting nutrients.

In addition, among limnetic biota, photoautotrophic biomass is the basis for the trophic 
web establishment. The term autotrophs is used for organisms that increase their mass 
through the accumulation of proteins which they manufacture, mainly from inorganic rad- 
icals (Stumm, 2004). This type of organisms can be found at the planktonic and benthic 
zones. 

These basic limnology concepts will be useful to follow the case study of an Arctic lake, 
presented in Section 4.2. 

3 Measures 

Earlier versions of the measures presented in this chapter were proposed recently (Fernandez
et al., 2012; Gershenson and Fernandez, 2012). The ones presented here are more refined
and are based on axioms. The benefit of using axioms is that the discussion takes place not
so much at the level of the measures, but at the level of the presuppositions or the
properties we want measures to have.

A comparison of the proposed measures with other previously proposed in the literature 
can be found in Gershenson and Fernandez (2012). It is worth noting that all of the proposed 
measures are unitless. 

3.1 Emergence 

We mentioned that emergence refers to properties of a phenomenon that are present now
and were not before. If we suppose these properties to be non-trivial, we could say it is
harder now than before to reproduce the phenomenon. Therefore, we can consider emergence as
the transit from a process which requires a small amount of information to be described to a
process which requires more information to be described^5. In other words, there is emergence
in a phenomenon when this phenomenon is producing information and, if we recall, Shannon
proposed a quantity which measures how much information was "produced" by a process.
Therefore, we will say that emergence is the same as Shannon's information I. From
now on, we will consider the emergence of a process E as the information I and we will
use the base two logarithm. With this configuration of parameters, the function E has a
maximum of 1 and a minimum of 0.



^5 Note that this transit can occur by a change in the phenomenon or by a change in its description.



$$E = I. \tag{2}$$

We now verify that the intuitive idea of emergence fulfills the three basic notions (axioms)
that Shannon used to derive I (Shannon's H). For the continuity axiom, a measure is expected
not to make big jumps when small changes are made. The second axiom is harder to show. It
states that if we consider an auxiliary function i, which is the function I when there are n
events with the same probability 1/n, then i is monotonically increasing. If we take the same
configuration for emergence, then we can think of the process as being, with equal
likelihood, in any of n available states. If something happens and now the process can be in
n + k equally likely states, we can say that the process has had emergence, since now we need
more information to know in which state the process is. For the third axiom, we need to find
a way to 'split' the process. Let us recall that the third property required by Shannon is
that if a choice can be broken into two different choices, the original I should be the
average of the other two I. In a process, we can think of the choices as a fraction of the
process that we are currently observing. For this purpose, we can make a partition^6 of the
domain; in our case, we get two subsets whose intersection is the null set and whose union is
the full original set. After this, we compute the function I for each. Since we observe two
different parts of a process and in each observation we get the average^7 new information
required to describe the (partial) process, it makes sense to take the average of both when
observing the full process.

E, as well as I, is a probabilistic measure. E = 1 means that when any random binary
variable becomes known, one bit of information emerges. If E = 0, then no new information
will emerge, even as random binary variables become "known" (they are known beforehand).
Again, we emphasize that emergence can take place at the level of a phenomenon observed
or at the level of the description of the phenomenon observed. Either can produce novel
information.

3.1.1 Multiple Scales 

When Shannon defined equation 1, he included K, which is a positive constant. This is
important because we will change the value of K to normalize the measure onto the [0, 1]
interval. The value of K depends on the length of the finite alphabet A we use. In the
particular Boolean case, we have the alphabet A = {0, 1} with length |A| = 2, and
the value K = 1 normalizes the measure to the interval [0, 1]. Because of the relevance
of the binary notation in computer science and other applications, we will often use the
Boolean alphabet. Nevertheless, we can compute the entropy for alphabets with different
lengths. We only have to consider the equation

$$K = \frac{1}{\log_2 b}, \tag{3}$$

where b is the length of the alphabet we use. In this way we will normalize E and measures
derived from it.

^6 We are using the set-theoretic notion of a partition; we could have any finite number of
subsets, provided that they are pairwise disjoint and that their union is the original set.

^7 When there are more than two subsets in the partition, we can make a weighted average, a
sort of expectation where the probability distribution is given by the nature of the process.

For example, consider the string in base 4 '0133013301330133...'. We can estimate P(0) =
P(1) = 0.25, P(2) = 0, and P(3) = 0.5. Following equation 1, we have I = -K(0.25 ·
log 0.25 + 0.25 · log 0.25 + 0 · log 0 + 0.5 · log 0.5), taking 0 · log 0 = 0. Since b = 4,
K = 1/log_2 4 = 0.5. Thus, we obtain a normalized I = 0.75.
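The base 4 example can be checked with the sketch below, which applies equation 3 to choose K. The function name `normalized_information` is hypothetical, and the alphabet length b is assumed to be supplied by the caller, since it cannot be inferred from a string in which some symbols (here, 2) never occur.

```python
import math
from collections import Counter


def normalized_information(string, b):
    """Shannon information normalized to [0, 1] with K = 1 / log2(b).

    b is the alphabet length (an input we assume is known beforehand).
    """
    K = 1.0 / math.log2(b)
    total = len(string)
    probs = [c / total for c in Counter(string).values()]
    # Absent symbols are skipped, i.e. 0 * log 0 is treated as 0.
    return -K * sum(p * math.log2(p) for p in probs)


# Finite prefix of '0133013301330133...' over the alphabet {0, 1, 2, 3}.
print(normalized_information("0133" * 4, b=4))  # 0.75
```

With b = 2 the constant reduces to K = 1, so the same function covers the Boolean case.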

3.2 Self-organization 

Self-organization has been correlated with an increase in order, i.e. a reduction of en- 
tropy (Gershenson and Heylighen, 2003). If emergence implies an increase of information, 
which is analogous to entropy and disorder, self-organization should be anti-correlated with 
emergence. 

A measure of self-organization S should be a function S : Σ → ℝ (where Σ = A^N) with
the following properties:

1. The range of S is the real interval [0, 1].

2. S(X) = 1 if and only if X is deterministic, i.e. we know beforehand the value of the
process.

3. S(X) = 0 if and only if X has a uniform distribution, i.e. any state of the process is
equally likely.

4. S(X) has a negative correlation with emergence E.

We propose as the measure

$$S = 1 - I = 1 - E. \tag{4}$$

It is straightforward to check that this function fulfills the axioms stated. It is not the
unique function to do so, but it is the only affine function which fulfills the axioms. For
simplicity, we propose the use of equation 4 as a measure of self-organization.

S = 1 means that there is maximum order, i.e. no new information is produced (I = E =
0). At the other extreme, S = 0 when there is no order at all, i.e. when any random variable
becomes known, information is produced/emerges (I = E = 1). When S = 1 (maximum
order), dynamics do not produce novel information, so the future is completely known from
the past. On the other hand, when S = 0 (minimum order), no past information tells us
anything about future information.

3.3 Complexity 

Following Lopez-Ruiz et al. (1995), we can define complexity C as the balance between
change (chaos) and stability (order). We have just defined such measures: emergence and
self-organization. The complexity function C : Σ → ℝ should have the following properties:

1. The range is the real interval [0, 1].

2. C = 1 if and only if S = E.

3. C = 0 if and only if S = 0 or E = 0.

It is natural to consider the product of S and E to satisfy the last two requirements. We
propose:

$$C = 4 \cdot E \cdot S, \tag{5}$$

where the constant 4 is added to normalize the measure to [0, 1]. C can also be represented
in terms of I as:

$$C = 4 \cdot I \cdot (1 - I). \tag{6}$$

Figure 2: Emergence E, self-organization S, and complexity C.

Figure 2 plots the measures proposed so far for different values of P(x). It can be seen
that E is maximal when P(x) = 0.5 and minimal when P(x) = 0 or P(x) = 1. The opposite
holds for S: it is minimal when P(x) = 0.5 and maximal when P(x) = 0 or P(x) = 1. C is
minimal when S or E are minimal, i.e. P(x) = 0 or P(x) = 0.5 or P(x) = 1. C is maximal
when E = S = 0.5, which occurs when P(x) ≈ 0.11 or P(x) ≈ 0.89.

Shannon information can be seen as a balance of zeros and ones (maximal when P(0) = 
P(l) = 0.5), while C can be seen as a balance of E and S (maximal when E = S = 0.5). 
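The relation between the three measures for a binary variable can be tabulated with a few lines of Python. This is an illustrative sketch only: the function names are ours, and p stands for the probability P(x) of one of the two symbols.

```python
import math


def emergence(p):
    """E = I for a binary variable with P(x) = p, base 2 logarithm."""
    if p in (0.0, 1.0):
        return 0.0  # a deterministic variable produces no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def self_organization(p):
    """S = 1 - E (equation 4)."""
    return 1.0 - emergence(p)


def complexity(p):
    """C = 4 * E * S (equation 5)."""
    e = emergence(p)
    return 4.0 * e * (1.0 - e)


for p in (0.0, 0.11, 0.5, 0.89, 1.0):
    print(f"P(x)={p:.2f}  E={emergence(p):.3f}  "
          f"S={self_organization(p):.3f}  C={complexity(p):.3f}")
```

The rows for P(x) = 0.11 and P(x) = 0.89 give E and S close to 0.5 and C close to 1, while the rows for 0, 0.5, and 1 give C = 0.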

3.4 Homeostasis 

The previous three measures (E, S, and C) study how single variables change in time.
To calculate the measures for a system, one can plot the histogram or simply average the
measures for all variables in a system. For homeostasis H, we are interested in how all
variables of a system change or not in time. Table 1 shows this difference: E, S, and C
focus on time series of variables (columns), while H focuses on states (rows).

Table 1: Difference between observing single variables in time (columns) and several variables
at one time (rows).

             X          Y          Z
t = m-2      x_{m-2}    y_{m-2}    z_{m-2}
t = m-1      x_{m-1}    y_{m-1}    z_{m-1}
t = m        x_m        y_m        z_m


Let X = x_1 x_2 x_3 ... x_n represent the state of a system of n variables (i.e. a row in
Table 1). If the system has a high homeostasis, we would expect that its states do not change
too much in time. The homeostasis function H : Σ × Σ → ℝ should have the following properties:

1. The range is the real interval [0, 1].

2. H = 1 if and only if for states X and X', X = X', i.e. there is no change in time.

3. H = 0 if and only if ∀i, x_i ≠ x'_i, i.e. all variables in the system changed.

A useful function for comparing strings of equal length is the Hamming distance. The
Hamming distance d measures the percentage of different symbols in two strings X and
X'. For binary strings, it can be calculated with the XOR function (⊕). Its normalization
bounds the Hamming distance to the interval [0, 1]:

$$d(X, X') = \frac{\sum_{i=1}^{|X|} x_i \oplus x'_i}{|X|}. \tag{7}$$


d measures the fraction of different symbols between X and X'. For the Boolean case,
d = 0 ⟺ X = X' and d = 1 ⟺ X = ¬X', while X and X' are uncorrelated ⟺ d ≈ 0.5.

We can use the complement of d to define h:

$$h(X^t, X^{t+1}) = 1 - d(X^t, X^{t+1}), \tag{8}$$

which clearly fulfills the desired properties of homeostasis between two states.
To measure the homeostasis of a system in time, we can generalize

$$H = \frac{1}{m-1} \sum_{t=0}^{m-1} h(X^t, X^{t+1}), \tag{9}$$

where m is the total number of time steps being evaluated. H will be simply the average
of different h from t = 0 to t = m − 1. As with the previous measures based on I, H is a
unitless measure.

When H is measured at higher scales, it can capture periodic dynamics. For example, let
us have a system with n = 2 variables and a cycle of period 2: 11 → 00 → 11. H for base 2
will be minimal, since every time step all variables change, i.e. ones turn into zeros or zeros
turn into ones. However, if we measure H in base 4, then we will actually be comparing
pairs of states, since to make one time step in base 4 we take two binary time steps. Thus,
in base 4 the attractor becomes 22 → 22, and H = 1. The same applies for higher bases. An
example of the usefulness of measuring H at multiple scales in elementary cellular automata
is explained in Gershenson and Fernandez (2012).
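The homeostasis measure and the period-2 example can be sketched in Python as follows. The function names are hypothetical, states are represented as equal-length strings, and, as a simplifying assumption of this sketch, H is computed as the plain mean of h over the consecutive pairs of states available in a finite trajectory.

```python
def hamming(x, y):
    """Normalized Hamming distance: fraction of positions that differ."""
    if len(x) != len(y):
        raise ValueError("states must have equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def h(x, y):
    """Homeostasis between two states: h = 1 - d."""
    return 1.0 - hamming(x, y)


def homeostasis(states):
    """Mean of h over consecutive states (assumption: all available pairs)."""
    pairs = list(zip(states, states[1:]))
    return sum(h(a, b) for a, b in pairs) / len(pairs)


# Period-2 cycle observed in base 2: every variable changes at every step.
print(homeostasis(["11", "00", "11", "00", "11"]))  # 0.0

# The same dynamics in base 4, where each symbol covers two binary steps.
print(homeostasis(["22", "22", "22"]))  # 1.0
```

The comparison `a != b` plays the role of XOR for binary symbols and also works unchanged for larger alphabets.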

3.5 Autopoiesis 

Let X represent the trajectories of the variables of a system and Y represent the trajectories
of the variables of the environment of the system. A measure of autopoiesis A : Σ × Σ → ℝ
should have the following properties:

1. A ≥ 0.

2. A should reflect the independence of X from Y. This implies:

(a) A > A' ⟺ X produces more of its own information than X' for a given Y.

(b) A > A' ⟺ X produces more of its own information in Y than in Y'.

(c) A = A' ⟺ X produces as much of its own information as X' for a given Y.

(d) A = A' ⟺ X produces as much of its own information in Y as in Y'.

(e) A = 0 if all of the information in X is produced by Y.

It is problematic to define in a general and direct way how some information depends 
on other information, as causality can be confounded with co-occurrence. For this reason, 
measures such as mutual information are not suitable for measuring A. 

As has been proposed, adaptive systems require a high C in order to be able to cope
with changes of their environment while at the same time maintaining their integrity
(Langton, 1990; Kauffman, 1993). If X had a high E, then it would not be able to produce its
own information. With a high S, X would not be able to adapt to changes in Y. Therefore, we
propose:

$$A = \frac{C(X)}{C(Y)}. \tag{10}$$

If C(X) = 0, then either X is static (E(X) = 0) or pseudorandom (S(X) = 0). This 
implies that any pattern (complexity) which could be observed in X (if any) should come 
from Y. This case gives a minimal A. On the other hand, if C(Y) = 0, it implies that 
any pattern (if any) in X should come from itself. This case gives a maximal A = ∞. A 
particular case occurs if C(X) = 0 and C(Y) = 0: A becomes undefined. But how can we 
say something about autopoiesis if we are comparing two systems which are either without 
variations (S = 1) or pseudorandom (E = 1)? This case should be undefined. The rest of 
the properties are evidently fulfilled by equation 10. This is certainly not the only function 
that fulfills the desired axioms. The exploration of alternatives requires further study. 

Since A represents a ratio of probabilities, it is a unitless measure. A ∈ [0, ∞), although 
it could be mapped to [0, 1) using a function such as f(A) = A/(1 + A). 
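
The limit cases of equation 10 can be made explicit in a few lines. The following is only a
minimal sketch: the function names are hypothetical, and the bounded mapping A/(1 + A) is
assumed here as one possible choice of f, not as the only one.

```python
import math


def autopoiesis(c_system: float, c_environment: float) -> float:
    """Equation 10: A = C(X) / C(Y), with the limit cases discussed in the text."""
    if c_system == 0 and c_environment == 0:
        return math.nan      # undefined: neither X nor Y shows any pattern
    if c_environment == 0:
        return math.inf      # any pattern in X must come from X itself
    return c_system / c_environment


def bounded(a: float) -> float:
    """Map A from [0, inf) to [0, 1), assuming f(A) = A / (1 + A)."""
    if math.isinf(a):
        return 1.0           # limit value, never reached by a finite A
    return a / (1.0 + a)


print(autopoiesis(0.9, 0.3))            # system more complex than its environment
print(bounded(autopoiesis(0.5, 0.5)))   # A = 1 is mapped to 0.5
```

With this mapping, A = 1 (system and environment equally complex) sits at the midpoint 0.5 of
the bounded scale.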

3.6 Multi-scale profiles 

Bar-Yam (2004) proposed the "complexity profile", which plots the complexity of systems 
depending on the scale at which they are observed. This allows one to compare how a measure 
changes with scale. For example, the σ profile compares the "satisfaction" of systems at 
different scales to study organization, evolution and cooperation (Gershenson, 2011). 

In a similar way, multi-scale profiles can be used for each of the measures proposed, giving 
more insight into the dynamics of a system than measuring them at a single scale. This 
is clearly seen, for example, with different types of elementary cellular automata (Gershenson 
and Fernandez, 2012). 
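
A multi-scale profile for a binary string could be sketched as follows. The sketch assumes
the definitions given earlier in the chapter (E as Shannon information normalized by the
base, S = 1 - E and C = 4·E·S); the function names and the example string are illustrative
only.

```python
import math
from collections import Counter


def emergence(symbols, base):
    """Shannon information of the symbol frequencies, normalized to [0, 1]."""
    counts = Counter(symbols)
    total = sum(counts.values())
    bits = sum((n / total) * math.log2(total / n) for n in counts.values())
    return bits / math.log2(base)


def profile(bits, max_scale):
    """Return (scale, E, S, C) tuples, grouping `scale` bits into one symbol."""
    rows = []
    for scale in range(1, max_scale + 1):
        usable = len(bits) - len(bits) % scale      # drop an incomplete tail
        symbols = [bits[i:i + scale] for i in range(0, usable, scale)]
        e = emergence(symbols, 2 ** scale)          # scale k corresponds to base 2^k
        s = 1.0 - e
        rows.append((scale, e, s, 4.0 * e * s))
    return rows


for row in profile("1100110011001100", 4):
    print(row)
```

For this string the profile is not monotonic: E is maximal when single bits are observed,
intermediate for pairs, and zero when four bits are grouped, since the string is then a
repetition of one symbol.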

4 Results 

In this section we apply the measures proposed in the previous section to two case studies: 
random Boolean networks and an aquatic ecosystem. A further case, elementary cellular 
automata, can be found in Gershenson and Fernandez (2012). 

4.1 Random Boolean Networks 

Results show averages of 1000 RBNs, where 1000 steps were run from a random initial state 
and E, S, C and H were calculated from data generated in 1000 additional steps. 

R (R Project Contributors, 2012) was used with packages BoolNet (Müssel et al., 2010) 
and entropy (Hausser and Strimmer, 2012). 

Figure 3 shows results for RBNs with 100 nodes, as the connectivity K varies. For low 
K, there is high S and H, and low E and C. This reflects the ordered regime of RBNs, 
where there is high robustness and few changes. Thus, it can be said that there is little or no 
information emerging and there is a high degree of self-organization and homeostasis. For 
high K, there is high E, low S and C, and an uncorrelated H ≈ 0.5. This reflects the chaotic 
regime of RBNs, where there is high fragility and many changes. Almost every bit (a new 
state for most nodes) carries novel emergent information, and this constant change implies 
low organization and complexity. For medium connectivities (2 ≤ K ≤ 3), there is a balance 
between E and S, leading to a high C. This corresponds to the critical regime of RBNs, 
which has been associated with complexity and the possibility of life (Kauffman, 2000). 
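
The experiment can be approximated with a short script. The results reported here were
obtained in R with BoolNet; the following Python sketch is not that code. It assumes a fixed
integer number K of inputs per node (rather than an average connectivity), per-node E
computed as Shannon information of each binary time series, S = 1 - E and C = 4·E·S, and it
uses much smaller networks and runs than those of Figure 3. All function names are
hypothetical, and H is not computed.

```python
import math
import random
from collections import Counter


def make_rbn(n, k, rng):
    """Each node gets k random input nodes and a random lookup table."""
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rng.randrange(2) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronous update of all nodes."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for source in node_inputs:
            index = 2 * index + state[source]
        new_state.append(table[index])
    return new_state


def emergence(series):
    """Shannon information of a binary series (base 2, so already in [0, 1])."""
    total = len(series)
    return sum((c / total) * math.log2(total / c) for c in Counter(series).values())


def average_measures(n, k, transient, samples, rng):
    """Mean E, S and C over the nodes of one random network."""
    inputs, tables = make_rbn(n, k, rng)
    state = [rng.randrange(2) for _ in range(n)]
    for _ in range(transient):                 # discard the initial steps
        state = step(state, inputs, tables)
    history = []
    for _ in range(samples):
        state = step(state, inputs, tables)
        history.append(state)
    e_sum = c_sum = 0.0
    for node in range(n):
        e = emergence([s[node] for s in history])
        e_sum += e
        c_sum += 4.0 * e * (1.0 - e)
    mean_e = e_sum / n
    return mean_e, 1.0 - mean_e, c_sum / n


rng = random.Random(0)
for k in (1, 2, 5):
    e, s, c = average_measures(n=20, k=k, transient=100, samples=100, rng=rng)
    print(k, round(e, 3), round(s, 3), round(c, 3))
```

A single small network is noisy; averaging over many networks per value of K, as done for
Figure 3, is needed before the ordered, critical and chaotic regimes can be compared.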

As for autopoiesis, to model a system and its environment, we coupled two RBNs: one 
"internal" RBN with Ni nodes and Ki average connections, and one "external" RBN with Ne 
nodes and Ke average connections. A "coupled" RBN is considered with Nc = Ni + Ne nodes and 
Ki connections. At every time step, the external RBN evolves independently. However, 
its state is copied to the Ne nodes representing it in the coupled RBN, which now evolves 
depending partly on the external RBN. Thus, the Ni nodes in the coupled RBN representing 
the internal RBN may be affected by the dynamics of the external RBN, but not vice versa. 
The C of each node is calculated and averaged separately for each network, obtaining an 
internal complexity Ci and an external complexity Ce. 
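
One possible reading of this coupling is sketched below. It is an illustration under
assumptions, not the code used for the reported results: each node has exactly K inputs, the
internal nodes may read any of the Nc nodes, no transient is discarded, and per-node
complexity is taken as C = 4·E·(1 - E). All names are hypothetical.

```python
import math
import random
from collections import Counter


def random_network(n_nodes, n_sources, k, rng):
    """k inputs per node, drawn from n_sources candidate nodes, plus random rules."""
    inputs = [[rng.randrange(n_sources) for _ in range(k)] for _ in range(n_nodes)]
    rules = [[rng.randrange(2) for _ in range(2 ** k)] for _ in range(n_nodes)]
    return inputs, rules


def update(network, state):
    """Next value of every node in `network`, reading its inputs from `state`."""
    inputs, rules = network
    nxt = []
    for node_inputs, rule in zip(inputs, rules):
        index = 0
        for source in node_inputs:
            index = 2 * index + state[source]
        nxt.append(rule[index])
    return nxt


def complexity(series):
    """C = 4 * E * (1 - E) for one binary time series."""
    total = len(series)
    e = sum((c / total) * math.log2(total / c) for c in Counter(series).values())
    return 4.0 * e * (1.0 - e)


def coupled_complexities(n_i, k_i, n_e, k_e, steps, rng):
    """Return (Ci, Ce), the node-averaged internal and external complexities."""
    external = random_network(n_e, n_e, k_e, rng)          # reads only itself
    internal = random_network(n_i, n_i + n_e, k_i, rng)    # may read external nodes
    ext = [rng.randrange(2) for _ in range(n_e)]
    inn = [rng.randrange(2) for _ in range(n_i)]
    int_hist, ext_hist = [], []
    for _ in range(steps):
        coupled = inn + ext            # external state copied into the coupled net
        inn = update(internal, coupled)
        ext = update(external, ext)    # evolves independently of the internal nodes
        int_hist.append(inn)
        ext_hist.append(ext)
    c_i = sum(complexity([s[j] for s in int_hist]) for j in range(n_i)) / n_i
    c_e = sum(complexity([s[j] for s in ext_hist]) for j in range(n_e)) / n_e
    return c_i, c_e


c_i, c_e = coupled_complexities(n_i=32, k_i=2, n_e=96, k_e=2, steps=200,
                                rng=random.Random(0))
print(c_i, c_e)
```

A then follows from equation 10 as Ci / Ce, with the limit cases Ce = 0 and Ci = Ce = 0
handled separately.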

Figure 4 and Table 2 show results for Ne = 96 and Ni = 32 for different combinations of 
Ke and Ki. 

Figure 3: Averages for 1000 RBNs, N = 100 nodes and varying average connectivity K 
(Gershenson and Fernandez, 2012). 



Table 2: A averages for 50 sets Ne = 96, Ni = 32. Same results as those shown in Figure 4. 



Ki \ Ke    1            2            3            4            5
1          0.4464025    0.5151070    0.7526248    1.6460345    3.4081967
2          1.6043330    0.9586809    1.1379227    2.0669794    3.2473729
3          2.4965328    0.9999926    0.9355231    1.3604272    2.6283798
4          2.1476247    0.7249803    0.6151742    0.8055051    1.3890630
5          1.8969094    0.4760027    0.3871875    0.4755580    0.8648389

Figure 4: A averages for 50 sets Ne = 96, Ni = 32. Values A < 1 are red while A > 1 are 
blue. The size of each circle indicates how far A is from A = 1. Numerical values are shown 
in Table 2. 

As was shown in Figure 3, C changes with K, so A ≈ 1 is expected when Ki ≈ Ke. When Ce 
is high (Ke = 2 or Ke = 3), the environment dominates the patterns of the system, 
yielding A < 1. When Ce is low (Ke < 2 or Ke > 3), the patterns produced by the system 
are not affected that much by its environment, thus A > 1, as long as Ki < Ke (otherwise 
the system is more chaotic than its environment, and so complex patterns have to come 
from outside). 

A does not try to measure how much information emerges internally or externally, but 
how much the patterns are internally or externally produced. A high E means that there 
is no pattern, as there is constant change. A high S implies a static pattern. A high C 
reflects complex patterns. We are interested in A measuring the ratio of the complexity of 
patterns being produced by a system compared to the complexity of patterns produced by 
its environment. 

4.2 An Ecological System: An Arctic Lake 

The data from an Arctic lake model used in this section were obtained using The Aquatic 
Ecosystem Simulator (Randerson and Bowker, 2008). 

In general, Arctic lake systems are classified as oligotrophic due to their low primary 
production, represented in chlorophyll values of 0.8-2.1 mg/m3. The lake's water column, or 
limnetic zone, is well-mixed; this means that there are no stratifications (layers with different 
temperatures). During winter (October to March), the surface of the lake is ice covered. 
During summer (April to September), ice melts and the water flow and evaporation increase, 
as shown in Figure 5A. Consequently, the two climatic periods (winter and summer) in the 
Arctic region cause a typical hydrologic behavior in lakes, such as the one shown in Figure 5B. 
This hydrologic behavior influences the physiochemical subsystem of the lake. 

Table 3 and Figure 6 show the variables and daily data we obtained from the Arctic lake 
simulation. The model used is deterministic, so there is no variation between simulation 
runs. Figure 6 depicts a higher dispersion for variables such as temperature (T) and light 
(L) at the three zones of the Arctic lake (surface = S, planktonic = P and benthic = B). Inflow 
and outflow (I&O), retention time (RT) and evaporation (Ev) also have a high dispersion, 
Ev being the variable with the highest dispersion. 

Observing RT and I&O on a logarithmic scale, we can see that their values are located at 
the extremes, but their range is not wide. Consequently, these variables have considerable 
variability in a short range. However, the ranges of the other variables do not reflect large 
changes. This situation complicates the interpretation and comparison of the physiochemical 
dynamics. To address this, we normalize all points x of all variables X to base b with 
the following equation: 



f(x) = \left\lfloor b \cdot \frac{x - \min X}{\max X - \min X} \right\rfloor,    (11) 

where ⌊x⌋ is the floor function of x. 
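
A direct transcription of this normalization could look as follows. Note that, taken
literally, the formula maps x = max X to class b rather than b - 1; the sketch assumes that
this maximum is folded into the top class so that exactly b classes result. This clamping,
like the function name, is an assumption made for illustration.

```python
import math


def normalize(values, base=10):
    """Map each x to floor(base * (x - min) / (max - min)), giving classes 0..base-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]           # a constant series has a single class
    classes = []
    for x in values:
        k = math.floor(base * (x - lo) / (hi - lo))
        classes.append(min(k, base - 1))     # assumption: fold x = max into the top class
    return classes


print(normalize([0, 1, 2, 5, 10]))
```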

Once all variables are transformed into a finite alphabet, in this case base 10 (b = 10), 
we can calculate emergence, self-organization, complexity, homeostasis and autopoiesis. 
Figure 7 depicts the number of points in each of the ten classes and shows the distribution 
of the values for each variable. Based on this distribution, the behavior of the variables can 
be easily described and compared. Variables with a more homogeneous distribution will produce 
more information, yielding higher values of emergence. Variables with a more heterogeneous 
distribution will produce higher self-organization values. The complexity of variables is not 
easy to deduce from Figure 7. 

Figure 5: (A) Climatic and (B) hydraulic regimes of Arctic lakes. 

Table 3: Physiochemical variables considered in the Arctic lake model. 

Variable                  Units       Acronym   Max      Min     Median   Mean      Std. dev.
Surface Light             MJ/m2/day   SL        30       1       5.1      11.06     11.27
Planktonic Light          MJ/m2/day   PL        28.2     1       4.9      10.46     10.57
Benthic Light             MJ/m2/day   BL        24.9     0.9     4.7      9.34      9.33
Surface Temperature       DegC        ST        8.6      0       1.5      3.04      3.34
Planktonic Temperature    DegC        PT        8.1      0.5     1.4      3.1       2.94
Benthic Temperature       DegC        BT        7.6      1.6     2        3.5       2.29
Inflow and Outflow        m3/sec      I&O       13.9     5.8     5.8      8.44      3.34
Retention Time            days        RT        100      41.7    99.8     78.75     25.7
Evaporation               m3/day      Ev        14325    0       2436.4   5065.94   5573.99
Zone Mixing               %/day       ZM        55       45      50       50        3.54
Inflow Conductivity       uS/cm       ICd       427      370.8   391.4    396.96    17.29
Planktonic Conductivity   uS/cm       PCd       650.1    547.6   567.1    585.25    38.55
Benthic Conductivity      uS/cm       BCd       668.4    560.7   580.4    600.32    40.84
Surface Oxygen            mg/litre    SO2       14.5     11.7    13.9     13.46     1.12
Planktonic Oxygen         mg/litre    PO2       13.1     10.5    12.6     12.15     1.02
Benthic Oxygen            mg/litre    BO2       13       9.4     12.5     11.62     1.51
Sediment Oxygen           mg/litre    SdO2      12.9     8.3     12.4     11.1      2.02
Inflow pH                 pH units    IpH       6.4      6       6.2      6.2       0.15
Planktonic pH             pH units    PpH       6.7      6.4     6.6      6.57      0.09
Benthic pH                pH units    BpH       6.6      6.4     6.5      6.52      0.07

4.2.1 Emergence, Self-organization, and Complexity 

Figure 8 shows the values of emergence, self-organization, and complexity of the 
physiochemical subsystem. Variables with a high complexity, C ∈ [0.8, 1], reflect a balance 
between change/chaos (emergence) and regularity/order (self-organization). This is the case 
for benthic and planktonic pH (BpH, PpH), I&O (Inflow and Outflow) and RT (Retention Time). 
For variables with high emergence (E > 0.92), like Inflow Conductivity (ICd) and Zone 
Mixing (ZM), change in time is constant, a necessary condition for exhibiting chaos. 
For the rest of the variables, self-organization values are low (S < 0.32), reflecting low 
regularity. It is interesting to notice that in this system there are no variables with high 
self-organization or low emergence. 
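
The relation between the three measures can be checked against the reported values. The
sketch below assumes the definitions used earlier in the chapter (E as Shannon information
normalized by the base b, S = 1 - E, C = 4·E·S); the function names are illustrative. The
value of E for benthic pH is the one reported in Table 5.

```python
import math
from collections import Counter


def emergence(symbols, base):
    """Normalized Shannon information of a sequence of class labels."""
    counts = Counter(symbols)
    total = sum(counts.values())
    bits = sum((n / total) * math.log2(total / n) for n in counts.values())
    return bits / math.log2(base)


def self_organization(e):
    return 1.0 - e


def complexity(e):
    return 4.0 * e * (1.0 - e)


e_bph = 0.44196793   # E of benthic pH as reported in Table 5
print(round(self_organization(e_bph), 8), round(complexity(e_bph), 8))
```

Under these assumptions the reported S and C of every variable follow from its E alone, which
is why a variable with E close to 0.5 has C close to 1.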

Since E, S, C ∈ [0, 1], these measures can be classified into five categories, as shown 
in Table 4. Each category is described by a range of values, a color and an adjective on 
a scale from very high to very low. This categorization is inspired by the 
categories for Colombian water pollution indices. These indices were proposed by Ramirez 
et al. (2003) and evaluated in Fernandez et al. (2005). 
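
Such a classification is straightforward to automate. The sketch below assumes that each
boundary value belongs to the higher category (that is, [0.8, 1], [0.6, 0.8), [0.4, 0.6),
[0.2, 0.4) and [0, 0.2)), which is the convention used in the text of this section; the
function name is hypothetical.

```python
def category(value):
    """Classify a measure in [0, 1] into one of the five categories."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("E, S and C are defined on [0, 1]")
    if value >= 0.8:
        return "Very High"
    if value >= 0.6:
        return "High"
    if value >= 0.4:
        return "Fair"
    if value >= 0.2:
        return "Low"
    return "Very Low"


for c in (0.98652912, 0.79458186, 0.1968596):
    print(c, category(c))
```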

Table 5 shows results of E, S, and C using the categories just mentioned. 

Figure 7: Transformed variables from the physiochemical subsystem to base 10. 

Table 4: Categories for classifying E, S, and C. 

Category   Very High   High         Fair         Low          Very Low
Range      [0.8, 1]    [0.6, 0.8)   [0.4, 0.6)   [0.2, 0.4)   [0, 0.2)
Color      Blue        Green        Yellow       Orange

Table 5: E, S, and C of physiochemical variables of the Arctic lake model. Also shown in 
Figure 8. 

Variable                  Acronym   E            S            C
Benthic pH                BpH       0.44196793   0.55803207   0.98652912
Inflow and Outflow        I&O       0.52310253   0.47689747   0.99786509
Retention Time            RT        0.53890552   0.46109448   0.99394544
Planktonic pH             PpH       0.54122993   0.45877007   0.99320037
Sediment Oxygen           SdO2      0.59328705   0.40671295   0.96519011
Benthic Oxygen            BO2       0.67904928   0.32095072   0.87176542
Inflow pH                 IpH       0.69570975   0.30429025   0.84679077
Benthic Temperature       BT        0.72661539   0.27338461   0.79458186
Planktonic Temperature    PT        0.75293885   0.24706115   0.74408774
Planktonic Light          PL        0.75582978   0.24417022   0.7382045
Surface Light             SL        0.75591484   0.24408516   0.73803038
Benthic Light             BL        0.76306133   0.23693867   0.72319494
Surface Oxygen            SO2       0.76509182   0.23490818   0.71890531
Surface Temperature       ST        0.76642121   0.23357879   0.71607895
Evaporation               Ev        0.76676234   0.23323766   0.71535142
Planktonic Oxygen         PO2       0.76887287   0.23112713   0.71082953
Benthic Conductivity      BCd       0.77974428   0.22025572   0.68697255
Planktonic Conductivity   PCd       0.78604873   0.21395127   0.6727045
Inflow Conductivity       ICd       0.92845597   0.07154403   0.26570192
Zone Mixing               ZM        0.94809050   0.0519095    0.1968596

Figure 8: E, S, and C of physiochemical variables of the Arctic lake model. Also shown in 
Table 5. 

From Table 5 and a principal component analysis (not shown), we can divide the values 
obtained into complexity categories as follows: 

Very High Complexity: C ∈ [0.8, 1]. The following variables balance self-organization 
and emergence: benthic and planktonic pH (BpH, PpH), inflow and outflow (I&O), 
and retention time (RT). It is remarkable that the increase of the hydrological regime 
during summer is inversely related to the dissolved oxygen (SdO2, BO2). 
This means that an increased flow causes oxygen depletion. Benthic Oxygen (BO2) and 
Inflow pH (IpH) show the lowest levels of the category. There is a negative correlation 
between them: a doubling of IpH is associated with a 40% decline of BO2. 

High Complexity: C ∈ [0.6, 0.8). This group includes 11 of the 20 variables and involves 
a high E and a low S. These 11 variables, which showed more chaotic than ordered 
states, are highly influenced by the solar radiation that defines the winter and summer 
seasons, as well as by the hydrological cycle. According to Table 5, these variables are: 
oxygen (PO2, SO2); surface, planktonic and benthic temperature (ST, PT, BT); conductivity 
(PCd, BCd); surface, planktonic and benthic light (SL, PL, BL); and evaporation (Ev). 

Very Low Complexity: C ∈ [0, 0.2). In this group, E is high and S is very low. This 
category includes the inflow conductivity (ICd) and water mixing variance (ZM). 
Both are high and directly correlated: an increase of the mixing percentage between 
planktonic and benthic zones is associated with an increase of inflow conductivity. 

4.2.2 Homeostasis 

The homeostasis was calculated by comparing the variation of all variables, representing the 
state of the Arctic subsystem every day. The timescale is very important, because H can 
vary considerably if we compare states every minute or every month. 
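
The daily comparison can be sketched as follows, assuming that h is one minus the normalized
Hamming distance between two consecutive states and that H is the mean of h, as defined in
the homeostasis section of this chapter. A state is taken to be the list of normalized
(base 10) values of all variables on a given day; the function name is hypothetical.

```python
def homeostasis(states):
    """Return (h_values, H) for a sequence of equally long state vectors."""
    if len(states) < 2:
        raise ValueError("at least two states are needed to compare")
    h_values = []
    for previous, current in zip(states, states[1:]):
        changed = sum(1 for a, b in zip(previous, current) if a != b)
        h_values.append(1.0 - changed / len(current))
    return h_values, sum(h_values) / len(h_values)


days = [[0, 1, 2], [0, 1, 2], [0, 1, 3], [9, 9, 9]]
print(homeostasis(days))
```

Sampling the same trajectory at a coarser timescale (for instance every 30th state) changes
which states are compared, which is why H depends on the timescale chosen.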

The h values have a mean (H) of 0.95739726 and a standard deviation of 0.064850247. 
The minimum h is 0.60 and the maximum h is 1.0. In an annual cycle, homeostasis shows 
four different patterns, as shown in Figure 9, which correspond to the seasonal variations 
between winter and summer. These four periods show scattered values of homeostasis as 
the result of the transitions from winter to summer and back to winter. The winter 
period (first and last days of the year) has very high h levels (1 or close to 1); it starts 
on day 212 and lasts until day 87. In this period, winter conditions are present: low light 
levels and temperature, maximum retention time due to ice cover, low inflow, outflow and 
water mixing interchange between planktonic and benthic zones, low conductivities and pH, 
and high oxygen. The second, third and fourth periods correspond to summer. 
The second period starts with the increase of benthic pH, zone mixing, and inflow-outflow 
variables. Between days 83 and 154, this period is characterized by extreme fluctuations as a 
result of an increase in temperature and light. Homeostasis fluctuates and reaches a minimum 
of 0.6 on day 116. At the end of this period, evaporation and zone mixing increase, while 
oxygen in the benthic zone and the sediment decreases. The third period (days 155 to 162) 
reflects the stabilization of the summer conditions, that is, maximum evaporation, temperature, 
light, zone mixing, conductivity and pH, and the lowest amount of oxygen. Homeostasis is 
maximal again for this period. The fourth period (days 163-211), which has h fluctuations 
near 0.9, corresponds to the transition from summer to winter conditions. 

Figure 9: Daily variations of homeostasis for an Arctic lake during a simulated year. 

As can be seen, using h, periodic or seasonal dynamics can be followed and studied. 

4.2.3 Autopoiesis 

Autopoiesis was measured for three components (subsystems) at the planktonic and benthic 
zones of the Arctic lake: physiochemical, limiting nutrients and biomass. They 
include the variables and organisms listed in Table 6. 

Table 6: Variables and organisms used for calculating autopoiesis. 

Component            Planktonic zone                      Benthic zone
Physiochemical       Light, Temperature, Conductivity,    Light, Temperature, Conductivity,
                     Oxygen, pH                           Oxygen, Sediment Oxygen, pH
Limiting Nutrients   Silicates, Nitrates, Phosphates,     Silicates, Nitrates, Phosphates,
                     Carbon Dioxide                       Carbon Dioxide
Biomass              Diatoms, Cyanobacteria,              Diatoms, Cyanobacteria,
                     Green Algae, Chlorophyta             Green Algae



According to the complexity categories established in Table 4, the planktonic and benthic 
components have been classified as follows: limiting nutrient variables in the 
low complexity category (C ∈ [0.2, 0.4); orange), physiochemical variables in the high 
complexity category (C ∈ [0.6, 0.8); green) and biomass in the very high complexity 
category (C ∈ [0.8, 1]; blue). A comparison of the complexity level for each subsystem 
of each zone (averaging their respective variables) is depicted in Figure 10. 

Figure 10: C of planktonic and benthic components. 



In order to compare the autonomy of each group of variables, equation 10 was applied 
to the complexity data, as shown in Figure 11. For the planktonic and benthic zones, we 
calculated the autopoiesis of the biomass elements in relation to limiting nutrient and 
physiochemical variables. All A values are greater than 1. This means that the variables 
related to living systems have a greater complexity than the variables related to their 
environment, represented by the limiting nutrient and physiochemical variables. While some 
physiochemical variables, including limiting nutrients, have a greater or lesser effect on the 
planktonic and benthic biomass, we can also estimate that planktonic and benthic biomass 
are more autonomous than their physiochemical and nutrient environments. The very 
high values of complexity of biomass imply that these living systems can adapt to the changes 
of their environments because of the balance between emergence and self-organization that 
they have. 

5 Discussion 

5.1 Measures 

The proposed measures characterize the different configurations and dynamics that elements 
of complex systems acquire through their interactions. Just as temperature averages the 
kinetic energy of molecules, much information is lost in the averaging, as the description of 
phenomena changes scale. The measures are probabilistic (except for H) and they all rely 
on statistical samples¹. Thus, the caveats of statistics and probability should be taken into 
consideration when using the proposed measures. Still, these measures capture the properties 
and tendencies of a system, which is why the scale at which they are described is appropriate. 
They will not indicate which element interacted with which element, how and when. If we 
are interested in the properties and tendencies of the elements, we can change scale and 
apply the measures there. Still, we have to be aware that the measures average, and 
thus simplify, the phenomena they describe. Whether relevant information is lost in the 
averaging depends not only on the phenomenon, but also on what kind of information we are 
interested in, i.e. relevance is also partially dependent on the observer (Gershenson, 2002). 

¹ This is also the reason why all measures are unitless. 

Figure 11: A of biomass depending on limiting nutrients and physiochemical components. 

5.2 Complexity as balance or entropy? 

Some approaches relate complexity with a high entropy, i.e. information content (Bar-Yam, 
2004; Delahaye and Zenil, 2007). However, just as chaos should not be confused with 
complexity (Gershenson, 2013), a very high entropy (high emergence E) implies too much 
change, where complex patterns are destroyed. On the other hand, a very low entropy (high 
self-organization S) prevents complex patterns from emerging. As has been proposed by several 
authors, complexity can be seen as a balance between order and disorder (Langton, 1990; 
Kauffman, 1993; Lopez-Ruiz et al., 1995), and thus it is logical to postulate C as a balance 
of E and S. 

It might seem contradictory to define emergence as the opposite of self-organization, as 
they are both present in several complex phenomena. However, when either of them (emergence 
or self-organization) is taken to the extreme, the other is negligible. It is precisely when 
both of them are balanced that complexity occurs, but this does not mean that both of them 
have to be maximal. 

5.3 Fisher Information 

C is correlated with Fisher information, which has been shown to be related to phase 
transitions (Prokopenko et al., 2011). Following the view of high complexity as a balance, it 
is natural that C is maximal at phase transitions, which is the case for both C and Fisher 
information. However, Fisher information is much steeper than C. It is appropriate for 
determining phase transitions, but it makes little distinction between dynamics 
farther from transitions. C is smoother, so it can represent dynamical change in a more 
gradual fashion. 

5.4 Guided Self-organization 

The measures proposed have several implications for guided self-organization (GSO), beyond 
providing a measure of self-organization. In order to guide a complex system, one has to 
detect what kind of dynamical regime it has. Depending on this, and on the desired 
configuration for the system, different interventions can be made (Gershenson, 2012a). The 
measures can inform directly about the dynamical regime and about the effect of the 
intervention. 

For example, if we want to have a system with a high complexity, first we need to measure 
its actual complexity. If it is not the desired one, then the dynamics are guided. 
But we also have to measure the complexity during the guiding process, to evaluate the 
effectiveness of the intervention. 

5.5 Scales 

The proposed measures can be applied at different scales, with drastic outcomes. For 
example, the string '1010101010' will have E = 1 in base 2, as P(0) = P(1) = 0.5. However, 
in base 4, each symbol pair is transformed into a single symbol, so the string is transformed 
to '22222', and thus P(2) = 1 and P(0) = P(1) = P(3) = 0, giving E = 0. Which scale(s) 
should be used is a question that has to be decided and justified. Multi-scale profiles can be 
helpful in visualizing how the measures change with scale. 
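
Both values can be verified by hand. The derivation below assumes the normalized Shannon
form of E used earlier in the chapter, with logarithms taken in the base b of the alphabet
and the usual convention 0 log 0 = 0:

```latex
\begin{align*}
E_{b=2} &= -\sum_{i \in \{0,1\}} P(i)\,\log_2 P(i)
         = -2\left(\tfrac{1}{2}\log_2 \tfrac{1}{2}\right) = 1, \\
E_{b=4} &= -\sum_{i=0}^{3} P(i)\,\log_4 P(i)
         = -\,1 \cdot \log_4 1 = 0.
\end{align*}
```

The same data therefore sit at opposite ends of the scale of E, depending only on how many
bits are grouped into one symbol.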

5.6 Normalization 

For treating continuous data, we used equation 11 to normalize to a finite alphabet of 
equally sized intervals. Clustering methods could also be used to process data into finite 
categories. Still, an issue might arise in either case if the available data do not represent 
the total range of possible values of a variable, e.g. data ∈ (4.5, 5.5) but the variable ∈ (0, 10). 
If we consider b = 10, then equation 11 would produce ten categories for the available 
data, which might be homogeneously distributed and thus give a high E. However, if we 
considered the variable range for equation 11, it would categorize the available data in only 
two categories, leading to a low E. This problem is similar to the one of scales. We suggest 
using the viability zone of a variable, when known, to normalize variables. 
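
The effect of the chosen range can be reproduced with the numbers of this example. The
sketch below is an illustration under assumptions: the function name is hypothetical, the
range is passed explicitly instead of being taken from the sample, values are clamped to the
classes 0 to b - 1, and the sample points are chosen as multiples of 1/16 so that the
arithmetic is exact.

```python
import math


def normalize_with_range(values, low, high, base=10):
    """Equation 11 with an explicit range instead of the sample min and max."""
    classes = []
    for x in values:
        k = math.floor(base * (x - low) / (high - low))
        classes.append(max(0, min(k, base - 1)))   # assumption: clamp to valid classes
    return classes


data = [4.5 + i / 16 for i in range(17)]           # samples from 4.5 to 5.5
own_range = normalize_with_range(data, min(data), max(data))
viability = normalize_with_range(data, 0.0, 10.0)
print(len(set(own_range)), len(set(viability)))    # 10 classes versus 2 classes
```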

6 Conclusions 

We reviewed measures of emergence, self-organization, complexity, homeostasis, and 
autopoiesis based on information theory. Axioms were postulated for each measure and 
equations were derived from them. Bearing in mind that several different measures have 
already been proposed (Prokopenko et al., 2009; Gershenson and Fernandez, 2012), this approach 
allows us to evaluate the axioms underlying the measures, as opposed to trying to compare 
different measures without a common ground. 

The generality and usefulness of the proposed measures will be evaluated gradually, as 
they are applied to different systems. These can be abstract (e.g. Turing machines (Delahaye 
and Zenil, 2007, 2012), ε-machines (Shalizi and Crutchfield, 2001; Görnerup and Crutchfield, 
2008)), biological (ecosystems, organisms), economic, social or technological (Helbing, 2011). 

The potential benefits of general measures such as the ones proposed here are manifold. Even 
if more appropriate measures are found with time, aiming at general measures which can 
characterize complexity, emergence, self-organization, homeostasis, autopoiesis, and related 
concepts for any observable system is a necessary step to take. 